Add winstress_vm, linstress_vm workloads and WORKLOAD_CONFIG override#1252
Add winstress_vm, linstress_vm workloads and WORKLOAD_CONFIG override#1252ebattat wants to merge 3 commits into
Conversation
- Add winstress_vm: Windows CPU+memory stress test with Python multiprocessing, supports scale via BootstormVM, configurable stress_cpu/stress_memory/stress_duration - Add linstress_vm: Linux Fedora equivalent using cloud-init embedded stress script, supports scale via BootstormVM, same metrics for direct Win/Lin comparison - Add WORKLOAD_CONFIG env var to override template defaults (requests_cpu, requests_memory, sockets, cores, threads) for any workload at runtime - Add pin_node0 to pin logic for node selector support Assisted-by: Claude Code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ebattat The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughThe PR adds linstress and winstress VM workload support across environment wiring, template rendering, workload templates, workload runners, and updated golden manifests for the generated Kubernetes and KubeVirt resources. ChangesVM workload support
Estimated code review effort: 5 (Critical) | ~120 minutes Possibly related PRs
Suggested labels: Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Assisted-by: Claude Code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…om linstress Assisted-by: Claude Code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml (1)
9-14: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winRemove the hardcoded VM login credential.
benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yamlstill setspassword: fedorawithssh_pwauth: true; use key-based auth or a per-run secret instead of a fixed, well-known password.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml` around lines 9 - 14, The VM template still hardcodes a known login password in the linstress cloud-init data, so update the relevant template generation path in linstress_vm_template.yaml to remove the fixed password and disable password auth by default. Use key-based authentication or inject a per-run secret through the template parameters instead, and ensure the generated golden file reflects the updated behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml`:
- Around line 9-14: The VM template still hardcodes a known login password in
the linstress cloud-init data, so update the relevant template generation path
in linstress_vm_template.yaml to remove the fixed password and disable password
auth by default. Use key-based authentication or inject a per-run secret through
the template parameters instead, and ensure the generated golden file reflects
the updated behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5db5e8e5-18b9-4126-8efc-8c48195cf61b
📒 Files selected for processing (4)
benchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yamlbenchmark_runner/main/environment_variables.pytests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml
💤 Files with no reviewable changes (1)
- benchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yaml
✅ Files skipped from review due to trivial changes (1)
- tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- benchmark_runner/main/environment_variables.py
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 13
🧹 Nitpick comments (6)
benchmark_runner/common/template_operations/template_operations.py (1)
156-173: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDuplicate override logic — extract a shared helper.
The new
workload_configblock (Line 165-173) duplicates thehammerdb_configblock (Line 156-163) almost verbatim (filter empties, diff unknown keys, warn, log, merge). Consider factoring this into a single private helper to avoid future divergence between the two override mechanisms.♻️ Suggested refactor
+ `@staticmethod` + def __apply_config_override(config: dict, config_name: str, render_data: dict): + config = {k: v for k, v in config.items() if v != ''} + if not config: + return + unknown_keys = set(config.keys()) - set(render_data.keys()) + if unknown_keys: + logger.warning(f'{config_name} unknown keys (will be ignored): {unknown_keys}') + logger.info(f'{config_name} override: {config}') + render_data.update(config) + hammerdb_config = self.__environment_variables_dict.get('hammerdb_config', {}) if hammerdb_config and ('hammerdb' in self.__workload_name or 'winmssql' in self.__workload_name): - hammerdb_config = {k: v for k, v in hammerdb_config.items() if v != ''} - unknown_keys = set(hammerdb_config.keys()) - set(render_data.keys()) - if unknown_keys: - logger.warning(f'HAMMERDB_CONFIG unknown keys (will be ignored): {unknown_keys}') - logger.info(f'HAMMERDB_CONFIG override: {hammerdb_config}') - render_data.update(hammerdb_config) + self.__apply_config_override(hammerdb_config, 'HAMMERDB_CONFIG', render_data) workload_config = self.__environment_variables_dict.get('workload_config', {}) - if workload_config: - workload_config = {k: v for k, v in workload_config.items() if v != ''} - unknown_keys = set(workload_config.keys()) - set(render_data.keys()) - if unknown_keys: - logger.warning(f'WORKLOAD_CONFIG unknown keys (will be ignored): {unknown_keys}') - logger.info(f'WORKLOAD_CONFIG override: {workload_config}') - render_data.update(workload_config) + if workload_config: + self.__apply_config_override(workload_config, 'WORKLOAD_CONFIG', render_data)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/common/template_operations/template_operations.py` around lines 156 - 173, The new workload_config handling duplicates the existing hammerdb_config override flow in TemplateOperations, so extract the shared “filter empty values, detect unknown keys, log, and merge into render_data” behavior into a private helper. Reuse that helper from the hammerdb_config and workload_config branches in template_operations.py so both paths stay consistent and future changes only need to be made once. Keep the workload_name condition for hammerdb_config in place, and preserve the existing logger.warning/logger.info behavior through the shared helper.benchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yaml (1)
4-12: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRepeated name-computation logic across the template (DRY).
The
{% if scale %}...{% else %}...{% endif %}pattern for VM name/labels and root-disk name is duplicated across 4+ locations. Consider computing these once via Jinja{% set %}at the top and reusing the variables, reducing risk of divergence on future edits.♻️ Example refactor
metadata: - {% if scale -%} - name: winstress-{{ kind }}-{{ trunc_uuid }}-{{ scale }} - labels: - kubevirt.io/vm: winstress-{{ kind }}-{{ trunc_uuid }}-{{ scale }} - {%- else -%} - name: winstress-{{ kind }}-{{ trunc_uuid }} - labels: - kubevirt.io/vm: winstress-{{ kind }}-{{ trunc_uuid }} - {%- endif %} + {%- set vm_name = 'winstress-' ~ kind ~ '-' ~ trunc_uuid ~ (('-' ~ scale) if scale else '') -%} + {%- set root_disk_name = 'winstress-' ~ kind ~ '-root-disk-' ~ trunc_uuid ~ (('-' ~ scale) if scale else '') -%} + name: {{ vm_name }} + labels: + kubevirt.io/vm: {{ vm_name }}Then reference
{{ vm_name }}/{{ root_disk_name }}at the other occurrences (labels, disks, volumes, dataVolumeTemplates).Also applies to: 23-27, 53-57, 113-122, 135-139
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yaml` around lines 4 - 12, The VM name/label logic in the template is duplicated in multiple places and should be centralized for consistency. In winstress_vm_template.yaml, add Jinja {% set %} variables near the top for the computed VM name and root disk name, then replace the repeated {% if scale %}...{% else %}...{% endif %} blocks in the template sections with those shared variables (for labels, disks, volumes, and dataVolumeTemplates). Use the existing template symbols like kind, trunc_uuid, and scale to build vm_name and root_disk_name once, then reference them everywhere else.benchmark_runner/workloads/linstress_vm.py (1)
53-53: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRemove unnecessary f-string prefix.
Ruff flags
F541: the string has no placeholders.- check_cmd = f'test -f /tmp/stress_report.json && echo found' + check_cmd = 'test -f /tmp/stress_report.json && echo found'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/workloads/linstress_vm.py` at line 53, The check_cmd assignment in linstress_vm.py is using an unnecessary f-string prefix even though there are no placeholders. Update the check_cmd string literal in the workload setup code to a plain string so it no longer triggers Ruff F541, keeping the same shell command content.Source: Linters/SAST tools
benchmark_runner/main/environment_variables.py (1)
107-128: 📐 Maintainability & Code Quality | 🔵 TrivialDocument new workloads in help text/README.
winstress_vmandlinstress_vmwere added toenvironment_variables_dict['workloads'], but the workload list documented in the coding guidelines' README content doesn't mention them yet. As per coding guidelines,**/*.py: "Document supported workload types in code comments or help text for WORKLOAD parameter (workloads: stressng_pod, stressng_vm, uperf_pod, uperf_vm, sysbench_pod, sysbench_vm, hammerdb variants, vdbench variants, fio variants, winfio_vm, bootstorm_vm, windows_vm, winmssql_vm, clusterbuster)" — please update the corresponding documentation to include the two new workload names.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/main/environment_variables.py` around lines 107 - 128, The workload list in the help text/README is missing the newly supported `winstress_vm` and `linstress_vm` entries added in `EnvironmentVariables`; update the documentation that describes the WORKLOAD parameter so it matches the values in `environment_variables_dict['workloads']`. Locate the workload documentation section referenced by the coding guidelines and add these two names alongside the existing workload variants, keeping the list consistent with the implementation.Source: Coding guidelines
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml (1)
96-102: 🔒 Security & Privacy | 🔵 Trivial | ⚖️ Poor tradeoffTemporary
Everyone:Fgrant on SSH key directory.Line 100 grants
Everyone:Fon.sshandauthorized_keys(needed to take ownership/delete a pre-existing file) before line 101 restores restrictive ACLs. This creates a brief window where any local principal can read/write the SSH authorized_keys file during boot. Consider using a SYSTEM-owned/administrator-only takeover approach (e.g.,takeownas SYSTEM without theEveryone:Fgrant) to avoid the permissive window.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml` around lines 96 - 102, The `winstress_vm.yaml` cloud-init `runcmd` block briefly opens `.ssh` and `authorized_keys` to `Everyone:F`, which should be removed. Update the `userData` setup steps so the existing file can be taken over and deleted using a SYSTEM/Administrator-only approach in the same `runcmd` sequence, then keep the final restrictive ACL restoration in place. Reference the `userData` SSH setup commands to locate the temporary permission grant and replace it with a non-permissive ownership reset flow.tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml (1)
126-129: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueRedundant package install.
dnf install -y python3-psutil || trueduplicates thepackages: - python3-psutilcloud-config directive above; if intentional as a fallback for module-ordering issues, a brief comment would help future readers.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml` around lines 126 - 129, The runcmd in the linstress_vm.yaml template is redundantly reinstalling python3-psutil even though it is already declared in the packages section. Update the runcmd block in the linstress_vm template to remove the duplicate dnf install step, or keep it only if truly needed as a fallback and add a brief inline comment explaining why. Use the runcmd and packages entries in func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml`:
- Around line 78-96: The monitoring loop in the linstress template is
oversleeping for short runs because `intervals = max(1, duration // 30)` always
forces at least one 30-second wait, so `burn_cpu` and `burn_memory` can finish
before sampling starts. Update the loop in the monitoring section to use
elapsed-time tracking against the requested `duration` instead of a fixed
interval count, and only sleep for the remaining time needed before each sample.
Keep the sampling logic and `cpu_procs`/`mem_proc` joins in place, but make sure
the first sample is taken within the actual stress window even when `duration <
30`.
In
`@benchmark_runner/common/template_operations/templates/winstress/internal_data/01_run_stress_template.ps1`:
- Line 6: The $reportFile variable in 01_run_stress_template.ps1 is assigned but
never reused, while the report path is hardcoded again later in the script.
Update the logic around $reportFile so the same variable is used wherever the
stress report path is needed, including the report-writing and report-reading
spots in the script, and remove the duplicated literal path to keep it
consistent with $stressDir.
In
`@benchmark_runner/common/template_operations/templates/winstress/internal_data/windows_dv_template.yaml`:
- Line 6: The DataVolume name in the Windows clone template is hardcoded and
shared across multiple Windows workloads, which can cause collisions during
concurrent runs. Update the name in windows_dv_template.yaml to be unique per
workload or per run by incorporating a suffix such as the existing template
uuid/identifier, and make sure the Windows clone path stays consistent in the
template that defines the DataVolume object.
In `@benchmark_runner/workloads/linstress_vm.py`:
- Around line 35-43: _remove the per-VM error-log call from the multiprocessing
phase helpers and let the top-level run() failure handling own it. In
_create_vm, _collect_results, and _delete_vm, avoid calling
self.save_error_logs() inside the child-process path; instead, just propagate
the exception and keep failure logging centralized in run()’s except block,
which already runs in the main process. Use the existing _run_parallel_phases
flow and the BootstormVM-style pattern as the reference for locating these step
functions._
- Around line 64-70: The SCP/report path in linstress_vm’s result handling is
swallowing failures by only logging a warning and returning from _scp_file
handling, which lets _upload_results skip failed VMs and still complete
successfully. Update the failure path around the stress report fetch in
linstress_vm to track missing-result/SCP failures per VM, and have the workload
lifecycle logic raise an exception or otherwise mark the run failed when no
usable results are produced. Make sure the fix is wired through the existing
_upload_results flow so repeated VM failures are not silently ignored.
- Around line 54-62: The polling loop in linstress_vm is using a range counter
with OC.DELAY, but the virtctl_ssh call can take much longer than that, so the
real wait can exceed self._timeout. Update the timeout logic in the stress
report check loop to use a wall-clock deadline or elapsed-time tracking around
virtctl_ssh and time.sleep, so the loop exits once the overall timeout is
reached. Keep the change localized to the polling block that calls
self._virtctl.virtctl_ssh and logs the “Waiting for stress_report.json” message.
In `@benchmark_runner/workloads/winstress_vm.py`:
- Around line 65-91: The failure handling in `_prepare_vm` and `_run_stress` is
inconsistent with `_create_vm`, `_collect_results`, and `_delete_vm`, because
these methods only log warnings and swallow exceptions. Update `_prepare_vm` and
`_run_stress` in `winstress_vm.py` to propagate failures by re-raising after
logging, so a VM that never becomes SSH-ready or fails during stress does not
continue into later phases and produce misleading results.
- Around line 98-106: The wait loop in winstress_vm’s stress_report.json polling
uses a fixed iteration count instead of real elapsed time, so the combined
virtctl_ssh timeout and sleep can exceed the intended self._timeout budget.
Update the polling in the method that builds check_cmd and calls
virtctl.virtctl_ssh to track wall-clock elapsed time (or a deadline) and stop
once the total time reaches self._timeout, keeping the per-check SSH timeout
bounded as well. Ensure the logger message and break condition still reflect the
same stress_report.json check logic.
- Around line 57-63: The worker helper error handlers in _create_vm(),
_collect_results(), and _delete_vm() are doing failure logging via
save_error_logs(), which should not run inside multiprocessing.Process workers.
Remove the save_error_logs() call from those except blocks and keep them only
re-raising the exception, then update run() to perform the error logging after
_run_parallel_phases() returns so the Elasticsearch upload happens in the parent
process.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml`:
- Around line 8-12: The cloud-init settings in the linstress VM template use a
hardcoded `user`/`password` pair with `ssh_pwauth` enabled, which should be
removed from this benchmark fixture. Update the `linstress_vm.yaml` template to
stop using fixed credentials in this cloud-config block and instead follow the
same safer approach used by the Windows workload (prefer SSH public key
injection and disable password auth), or at minimum generate a per-run random
password rather than a literal default.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml`:
- Around line 8-12: The cloud-init rendered by the linstress VM template is
using hardcoded weak credentials and enabling SSH password login. Update the
linstress cloud-init/template that produces linstress_vm.yaml so it no longer
sets a fixed user/password pair and does not enable ssh_pwauth; switch the
template to key-based auth only or generate a unique per-run password. Use the
cloud-init generation path associated with linstress_vm and the template/data
source that fills user, password, and ssh_pwauth to locate the fix.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml`:
- Around line 6-12: The VM userdata in the linstress template hardcodes the
fedora account password, keeps it non-expiring, and enables password SSH auth.
Update the cloud-config in the linstress VM template so it relies on the SSH key
flow used by linstress_vm.py, and remove or template the password-related fields
and ssh_pwauth setting instead of shipping fixed credentials.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_False/linstress_vm.yaml`:
- Around line 9-12: The cloud-init template for the linstress VM is using
hardcoded weak SSH credentials and password auth, which should be removed for
benchmark_runner artifacts. Update the template behind the linstress_vm.yaml
asset so the user/password values are not static (for example, inject a per-run
secret through the data template or switch the VM to SSH key auth), and keep the
relevant cloud-init fields such as ssh_pwauth and chpasswd aligned with that
change.
---
Nitpick comments:
In `@benchmark_runner/common/template_operations/template_operations.py`:
- Around line 156-173: The new workload_config handling duplicates the existing
hammerdb_config override flow in TemplateOperations, so extract the shared
“filter empty values, detect unknown keys, log, and merge into render_data”
behavior into a private helper. Reuse that helper from the hammerdb_config and
workload_config branches in template_operations.py so both paths stay consistent
and future changes only need to be made once. Keep the workload_name condition
for hammerdb_config in place, and preserve the existing
logger.warning/logger.info behavior through the shared helper.
In
`@benchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yaml`:
- Around line 4-12: The VM name/label logic in the template is duplicated in
multiple places and should be centralized for consistency. In
winstress_vm_template.yaml, add Jinja {% set %} variables near the top for the
computed VM name and root disk name, then replace the repeated {% if scale
%}...{% else %}...{% endif %} blocks in the template sections with those shared
variables (for labels, disks, volumes, and dataVolumeTemplates). Use the
existing template symbols like kind, trunc_uuid, and scale to build vm_name and
root_disk_name once, then reference them everywhere else.
In `@benchmark_runner/main/environment_variables.py`:
- Around line 107-128: The workload list in the help text/README is missing the
newly supported `winstress_vm` and `linstress_vm` entries added in
`EnvironmentVariables`; update the documentation that describes the WORKLOAD
parameter so it matches the values in `environment_variables_dict['workloads']`.
Locate the workload documentation section referenced by the coding guidelines
and add these two names alongside the existing workload variants, keeping the
list consistent with the implementation.
In `@benchmark_runner/workloads/linstress_vm.py`:
- Line 53: The check_cmd assignment in linstress_vm.py is using an unnecessary
f-string prefix even though there are no placeholders. Update the check_cmd
string literal in the workload setup code to a plain string so it no longer
triggers Ruff F541, keeping the same shell command content.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml`:
- Around line 96-102: The `winstress_vm.yaml` cloud-init `runcmd` block briefly
opens `.ssh` and `authorized_keys` to `Everyone:F`, which should be removed.
Update the `userData` setup steps so the existing file can be taken over and
deleted using a SYSTEM/Administrator-only approach in the same `runcmd`
sequence, then keep the final restrictive ACL restoration in place. Reference
the `userData` SSH setup commands to locate the temporary permission grant and
replace it with a non-permissive ownership reset flow.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml`:
- Around line 126-129: The runcmd in the linstress_vm.yaml template is
redundantly reinstalling python3-psutil even though it is already declared in
the packages section. Update the runcmd block in the linstress_vm template to
remove the duplicate dnf install step, or keep it only if truly needed as a
fallback and add a brief inline comment explaining why. Use the runcmd and
packages entries in func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml to
locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5d3c4db4-ec96-4cf5-a93f-f2de947eb4b1
📒 Files selected for processing (64)
benchmark_runner/common/template_operations/template_operations.pybenchmark_runner/common/template_operations/templates/linstress/__init__.pybenchmark_runner/common/template_operations/templates/linstress/internal_data/__init__.pybenchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yamlbenchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yamlbenchmark_runner/common/template_operations/templates/winstress/__init__.pybenchmark_runner/common/template_operations/templates/winstress/internal_data/01_run_stress_template.ps1benchmark_runner/common/template_operations/templates/winstress/internal_data/__init__.pybenchmark_runner/common/template_operations/templates/winstress/internal_data/windows_dv_template.yamlbenchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yamlbenchmark_runner/common/template_operations/templates/winstress/winstress_data_template.yamlbenchmark_runner/main/environment_variables.pybenchmark_runner/workloads/linstress_vm.pybenchmark_runner/workloads/winstress_vm.pytests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_linstress_vm_ODF_PVC_False/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_linstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_linstress_vm_ODF_PVC_True/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_linstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/winstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_True/winstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_False/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_False/winstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_True/winstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_True/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_False/winstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_True/winstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_False/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_True/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_False/winstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_True/winstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_linstress_vm_ODF_PVC_False/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_linstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_linstress_vm_ODF_PVC_True/linstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_linstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_False/winstress_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_True/winstress_vm.yaml
| intervals = max(1, duration // 30) | ||
| samples = [] | ||
| for i in range(intervals): | ||
| time.sleep(30) | ||
| mem = psutil.virtual_memory() | ||
| cpu_pct = psutil.cpu_percent(interval=1) | ||
| sample = { | ||
| 'time_sec': (i+1)*30, | ||
| 'cpu_percent': cpu_pct, | ||
| 'mem_percent': mem.percent, | ||
| 'mem_used_gb': round(mem.used / (1024**3), 1), | ||
| 'mem_total_gb': round(mem.total / (1024**3), 1) | ||
| } | ||
| samples.append(sample) | ||
| print(f"At {sample['time_sec']}s: CPU={cpu_pct}% MEM={mem.percent}% ({sample['mem_used_gb']}GB/{sample['mem_total_gb']}GB)") | ||
|
|
||
| [p.join() for p in cpu_procs] | ||
| if mem_proc: | ||
| mem_proc.join() |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Monitoring loop can sleep past the actual stress duration.
intervals = max(1, duration // 30) forces at least one time.sleep(30) iteration even when duration < 30. In that case burn_cpu/burn_memory workers finish long before the monitor wakes up, so the captured CPU/memory samples reflect post-stress idle state rather than the actual stress window — producing misleading benchmark data for short-duration runs (a common case for quick/test_ci iterations).
🐛 Proposed fix using elapsed-time tracking instead of a fixed iteration count
- intervals = max(1, duration // 30)
- samples = []
- for i in range(intervals):
- time.sleep(30)
+ samples = []
+ elapsed_total = 0
+ while elapsed_total < duration:
+ step = min(30, duration - elapsed_total)
+ time.sleep(step)
+ elapsed_total += step
mem = psutil.virtual_memory()
cpu_pct = psutil.cpu_percent(interval=1)
sample = {
- 'time_sec': (i+1)*30,
+ 'time_sec': elapsed_total,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| intervals = max(1, duration // 30) | |
| samples = [] | |
| for i in range(intervals): | |
| time.sleep(30) | |
| mem = psutil.virtual_memory() | |
| cpu_pct = psutil.cpu_percent(interval=1) | |
| sample = { | |
| 'time_sec': (i+1)*30, | |
| 'cpu_percent': cpu_pct, | |
| 'mem_percent': mem.percent, | |
| 'mem_used_gb': round(mem.used / (1024**3), 1), | |
| 'mem_total_gb': round(mem.total / (1024**3), 1) | |
| } | |
| samples.append(sample) | |
| print(f"At {sample['time_sec']}s: CPU={cpu_pct}% MEM={mem.percent}% ({sample['mem_used_gb']}GB/{sample['mem_total_gb']}GB)") | |
| [p.join() for p in cpu_procs] | |
| if mem_proc: | |
| mem_proc.join() | |
| samples = [] | |
| elapsed_total = 0 | |
| while elapsed_total < duration: | |
| step = min(30, duration - elapsed_total) | |
| time.sleep(step) | |
| elapsed_total += step | |
| mem = psutil.virtual_memory() | |
| cpu_pct = psutil.cpu_percent(interval=1) | |
| sample = { | |
| 'time_sec': elapsed_total, | |
| 'cpu_percent': cpu_pct, | |
| 'mem_percent': mem.percent, | |
| 'mem_used_gb': round(mem.used / (1024**3), 1), | |
| 'mem_total_gb': round(mem.total / (1024**3), 1) | |
| } | |
| samples.append(sample) | |
| print(f"At {sample['time_sec']}s: CPU={cpu_pct}% MEM={mem.percent}% ({sample['mem_used_gb']}GB/{sample['mem_total_gb']}GB)") | |
| [p.join() for p in cpu_procs] | |
| if mem_proc: | |
| mem_proc.join() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml`
around lines 78 - 96, The monitoring loop in the linstress template is
oversleeping for short runs because `intervals = max(1, duration // 30)` always
forces at least one 30-second wait, so `burn_cpu` and `burn_memory` can finish
before sampling starts. Update the loop in the monitoring section to use
elapsed-time tracking against the requested `duration` instead of a fixed
interval count, and only sleep for the remaining time needed before each sample.
Keep the sampling logic and `cpu_procs`/`mem_proc` joins in place, but make sure
the first sample is taken within the actual stress window even when `duration <
30`.
| $stressDir = 'C:\tools\stress' | ||
| $pythonExe = 'C:\Program Files\Python312\python.exe' | ||
| $stressScript = "$stressDir\stress.py" | ||
| $reportFile = "$stressDir\stress_report.json" |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Unused $reportFile variable; report path duplicated instead of reused.
Static analysis flags $reportFile as assigned but never used. The literal path C:\tools\stress\stress_report.json is hardcoded again at Lines 114 and 116, risking drift if $stressDir changes.
♻️ Proposed fix
- with open(r'C:\tools\stress\stress_report.json', 'w') as f:
+ with open(r'$reportFile', 'w') as f:
json.dump(report, f, indent=2)
- print('Report saved to C:\\tools\\stress\\stress_report.json')
+ print('Report saved to $reportFile')Also applies to: 114-116
🧰 Tools
🪛 PSScriptAnalyzer (1.25.0)
[warning] 6-6: The variable 'reportFile' is assigned but never used.
(PSUseDeclaredVarsMoreThanAssignments)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@benchmark_runner/common/template_operations/templates/winstress/internal_data/01_run_stress_template.ps1`
at line 6, The $reportFile variable in 01_run_stress_template.ps1 is assigned
but never reused, while the report path is hardcoded again later in the script.
Update the logic around $reportFile so the same variable is used wherever the
stress report path is needed, including the report-writing and report-reading
spots in the script, and remove the duplicated literal path to keep it
consistent with $stressDir.
Source: Linters/SAST tools
| metadata: | ||
| annotations: | ||
| cdi.kubevirt.io/storage.deleteAfterCompletion: "false" | ||
| name: windows-clone-dv |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check whether another workload template defines a DV with the same name
rg -n "windows-clone-dv" benchmark_runner/common/template_operations/templatesRepository: redhat-performance/benchmark-runner
Length of output: 1243
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the relevant Windows templates for uniqueness/uuid usage
sed -n '1,220p' benchmark_runner/common/template_operations/templates/windows/internal_data/windows_dv_template.yaml
printf '\n---\n'
sed -n '1,240p' benchmark_runner/common/template_operations/templates/windows/internal_data/windows_vm_template.yaml
printf '\n---\n'
sed -n '1,220p' benchmark_runner/common/template_operations/templates/winstress/internal_data/windows_dv_template.yaml
printf '\n---\n'
sed -n '1,260p' benchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yaml
printf '\n---\n'
sed -n '1,220p' benchmark_runner/common/template_operations/templates/winmssql/internal_data/windows_dv_template.yaml
printf '\n---\n'
sed -n '1,260p' benchmark_runner/common/template_operations/templates/winmssql/internal_data/winmssql_vm_template.yaml
printf '\n---\n'
sed -n '1,220p' benchmark_runner/common/template_operations/templates/winfio/internal_data/windows_dv_template.yaml
printf '\n---\n'
sed -n '1,260p' benchmark_runner/common/template_operations/templates/winfio/internal_data/winfio_vm_template.yaml
# Search for uuid usage in Windows templates
printf '\n=== uuid hits ===\n'
rg -n "uuid|windows-clone-dv" benchmark_runner/common/template_operations/templates/{windows,winstress,winmssql,winfio} -g '*.yaml'Repository: redhat-performance/benchmark-runner
Length of output: 37077
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect Windows workload data templates for URL/uuid and whether the clone DV is intended to be shared
sed -n '1,220p' benchmark_runner/common/template_operations/templates/windows/windows_data_template.yaml
printf '\n---\n'
sed -n '1,220p' benchmark_runner/common/template_operations/templates/winstress/winstress_data_template.yaml
printf '\n---\n'
sed -n '1,220p' benchmark_runner/common/template_operations/templates/winmssql/winmssql_data_template.yaml
printf '\n---\n'
sed -n '1,220p' benchmark_runner/common/template_operations/templates/winfio/winfio_data_template.yaml
printf '\n=== references to windows-clone-dv in Python code ===\n'
rg -n "windows-clone-dv|dataVolumeTemplates|source:\s*$|source:\n\s*pvc:" benchmark_runner -g '*.py' -g '*.yaml'
printf '\n=== any cleanup/delete logic for windows clone DV ===\n'
rg -n "deleteAfterCompletion|delete.*DataVolume|DataVolume.*delete|windows-clone" benchmark_runner -g '*.py' -g '*.yaml'Repository: redhat-performance/benchmark-runner
Length of output: 3365
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check whether namespace is run-specific or shared, and whether clone DV names are reused intentionally
rg -n "namespace:|name: windows-clone-dv|deleteAfterCompletion|uuid: \{\{ uuid \}\}" benchmark_runner/common/template_operations/templates -g '*.yaml' -g '*.py' -g '*.json'
printf '\n=== common template data / namespace sources ===\n'
sed -n '1,220p' benchmark_runner/common/template_operations/templates/common_template.yaml
printf '\n---\n'
rg -n "namespace" benchmark_runner/main benchmark_runner/common/template_operations -g '*.py' -g '*.yaml'Repository: redhat-performance/benchmark-runner
Length of output: 32316
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect workload namespace grouping for the Windows workloads
sed -n '120,220p' benchmark_runner/main/environment_variables.py
printf '\n---\n'
sed -n '430,470p' benchmark_runner/main/environment_variables.py
printf '\n=== windows-related workload names ===\n'
rg -n "\"(windows|winstress|winmssql|winfio)[^\"]*\"" benchmark_runner/main/environment_variables.pyRepository: redhat-performance/benchmark-runner
Length of output: 8725
Make the Windows clone DataVolume name unique
windows-clone-dv is shared by windows, winstress, winmssql, and winfio in the same benchmark-runner namespace, so concurrent Windows runs will contend for the same DataVolume object and can fail. Add a workload/run suffix such as {{ uuid }} or otherwise make this name unique.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@benchmark_runner/common/template_operations/templates/winstress/internal_data/windows_dv_template.yaml`
at line 6, The DataVolume name in the Windows clone template is hardcoded and
shared across multiple Windows workloads, which can cause collisions during
concurrent runs. Update the name in windows_dv_template.yaml to be unique per
workload or per run by incorporating a suffix such as the existing template
uuid/identifier, and make sure the Windows clone path stays consistent in the
template that defines the DataVolume object.
| def _create_vm(self, vm_num: str): | ||
| try: | ||
| vm_name = self._get_vm_name(vm_num) | ||
| self._oc.create_async(yaml=self._get_vm_yaml(vm_num)) | ||
| self._oc.wait_for_vm_create(vm_name=vm_name) | ||
| logger.info(f'VM {vm_name} created, stress script running via cloud-init') | ||
| except Exception as err: | ||
| self.save_error_logs() | ||
| raise err |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win
Don't call save_error_logs() from per-VM step functions run via _run_parallel_phases.
_create_vm, _collect_results, and _delete_vm are dispatched as parallel phases (likely multiprocessing child processes, same pattern as BootstormVM). Calling self.save_error_logs() from within these child-process functions risks the documented OpenSSL/SIGSEGV crash across fork boundaries. run()'s own except block (lines 170-172) already handles failure logging after phases complete, in the main process — the per-step calls are redundant and unsafe.
Based on learnings, "avoid calling _save_error_logs() from within multiprocessing child processes... this can trigger the documented SSL/SIGSEGV multiprocessing crash," and the recommended pattern is to do all HTTPS work in the main process after child processes complete or entirely within child processes consistently — not scattered ad hoc in both.
🔧 Proposed fix: drop per-step error-log calls, let `run()` handle it
def _create_vm(self, vm_num: str):
try:
vm_name = self._get_vm_name(vm_num)
self._oc.create_async(yaml=self._get_vm_yaml(vm_num))
self._oc.wait_for_vm_create(vm_name=vm_name)
logger.info(f'VM {vm_name} created, stress script running via cloud-init')
except Exception as err:
- self.save_error_logs()
raise err except Exception as err:
- self.save_error_logs()
raise err
def _delete_vm(self, vm_num: str):
try:
vm_name = self._get_vm_name(vm_num)
self._oc.delete_vm_sync(yaml=self._get_vm_yaml(vm_num), vm_name=vm_name)
except Exception as err:
- self.save_error_logs()
raise errAlso applies to: 103-105, 107-113
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@benchmark_runner/workloads/linstress_vm.py` around lines 35 - 43, _remove the
per-VM error-log call from the multiprocessing phase helpers and let the
top-level run() failure handling own it. In _create_vm, _collect_results, and
_delete_vm, avoid calling self.save_error_logs() inside the child-process path;
instead, just propagate the exception and keep failure logging centralized in
run()’s except block, which already runs in the main process. Use the existing
_run_parallel_phases flow and the BootstormVM-style pattern as the reference for
locating these step functions._
Source: Learnings
| for elapsed in range(0, self._timeout, OC.DELAY): | ||
| check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd, | ||
| namespace=self.__namespace, key_path=self.__ssh_key_path, | ||
| username=self.__username, timeout=60) | ||
| if check and 'found' in check: | ||
| break | ||
| if elapsed > 0 and elapsed % 300 == 0: | ||
| logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)') | ||
| time.sleep(OC.DELAY) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Polling loop can overrun the declared timeout.
Each iteration performs an SSH call (up to 60s) and then sleeps OC.DELAY, but the loop counter advances only by OC.DELAY per iteration — so if SSH calls are slow, actual wall-clock wait can far exceed self._timeout. As per path instructions, timeout handling for workloads should be checked carefully; here the bound is effectively iterations * (ssh_call_time + OC.DELAY) rather than self._timeout.
⏱️ Proposed fix: use wall-clock deadline instead of loop-counter
- for elapsed in range(0, self._timeout, OC.DELAY):
- check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd,
- namespace=self.__namespace, key_path=self.__ssh_key_path,
- username=self.__username, timeout=60)
- if check and 'found' in check:
- break
- if elapsed > 0 and elapsed % 300 == 0:
- logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)')
- time.sleep(OC.DELAY)
+ deadline = time.time() + self._timeout
+ while time.time() < deadline:
+ check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd,
+ namespace=self.__namespace, key_path=self.__ssh_key_path,
+ username=self.__username, timeout=60)
+ if check and 'found' in check:
+ break
+ logger.info(f'Waiting for stress_report.json on VM {vm_name}...')
+ time.sleep(OC.DELAY)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for elapsed in range(0, self._timeout, OC.DELAY): | |
| check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd, | |
| namespace=self.__namespace, key_path=self.__ssh_key_path, | |
| username=self.__username, timeout=60) | |
| if check and 'found' in check: | |
| break | |
| if elapsed > 0 and elapsed % 300 == 0: | |
| logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)') | |
| time.sleep(OC.DELAY) | |
| deadline = time.time() + self._timeout | |
| while time.time() < deadline: | |
| check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd, | |
| namespace=self.__namespace, key_path=self.__ssh_key_path, | |
| username=self.__username, timeout=60) | |
| if check and 'found' in check: | |
| break | |
| logger.info(f'Waiting for stress_report.json on VM {vm_name}...') | |
| time.sleep(OC.DELAY) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@benchmark_runner/workloads/linstress_vm.py` around lines 54 - 62, The polling
loop in linstress_vm is using a range counter with OC.DELAY, but the virtctl_ssh
call can take much longer than that, so the real wait can exceed self._timeout.
Update the timeout logic in the stress report check loop to use a wall-clock
deadline or elapsed-time tracking around virtctl_ssh and time.sleep, so the loop
exits once the overall timeout is reached. Keep the change localized to the
polling block that calls self._virtctl.virtctl_ssh and logs the “Waiting for
stress_report.json” message.
Source: Path instructions
| check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"' | ||
| for elapsed in range(0, self._timeout, OC.DELAY): | ||
| check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd, | ||
| namespace=self.__namespace, key_path=self.__ssh_key_path, | ||
| username=self.__username, timeout=60) | ||
| if check and 'found' in check: | ||
| break | ||
| logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)') | ||
| time.sleep(OC.DELAY) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Wait loop can exceed the intended self._timeout budget.
The loop iterates self._timeout // OC.DELAY times, but each iteration also issues a virtctl_ssh check with its own independent timeout=60 plus time.sleep(OC.DELAY). If the SSH check is slow/hanging, real elapsed time is iterations * (up_to_60 + OC.DELAY), which can far exceed self._timeout — the loop bound only counts iterations, not actual elapsed wall-clock time.
🕒 Proposed fix using actual elapsed time
- check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"'
- for elapsed in range(0, self._timeout, OC.DELAY):
+ check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"'
+ deadline = time.time() + self._timeout
+ while time.time() < deadline:
check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd,
namespace=self.__namespace, key_path=self.__ssh_key_path,
username=self.__username, timeout=60)
if check and 'found' in check:
break
- logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)')
+ logger.info(f'Waiting for stress_report.json on VM {vm_name}...')
time.sleep(OC.DELAY)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"' | |
| for elapsed in range(0, self._timeout, OC.DELAY): | |
| check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd, | |
| namespace=self.__namespace, key_path=self.__ssh_key_path, | |
| username=self.__username, timeout=60) | |
| if check and 'found' in check: | |
| break | |
| logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)') | |
| time.sleep(OC.DELAY) | |
| check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"' | |
| deadline = time.time() + self._timeout | |
| while time.time() < deadline: | |
| check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd, | |
| namespace=self.__namespace, key_path=self.__ssh_key_path, | |
| username=self.__username, timeout=60) | |
| if check and 'found' in check: | |
| break | |
| logger.info(f'Waiting for stress_report.json on VM {vm_name}...') | |
| time.sleep(OC.DELAY) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@benchmark_runner/workloads/winstress_vm.py` around lines 98 - 106, The wait
loop in winstress_vm’s stress_report.json polling uses a fixed iteration count
instead of real elapsed time, so the combined virtctl_ssh timeout and sleep can
exceed the intended self._timeout budget. Update the polling in the method that
builds check_cmd and calls virtctl.virtctl_ssh to track wall-clock elapsed time
(or a deadline) and stop once the total time reaches self._timeout, keeping the
per-check SSH timeout bounded as well. Ensure the logger message and break
condition still reflect the same stress_report.json check logic.
Source: Path instructions
| #cloud-config | ||
| user: fedora | ||
| password: fedora | ||
| chpasswd: { expire: False } | ||
| ssh_pwauth: true |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Hardcoded weak default credentials with password SSH auth enabled.
Cloud-init sets user: fedora / password: fedora with ssh_pwauth: true, enabling password-based SSH login with a well-known default credential pair. Recommend disabling ssh_pwauth in favor of injecting an SSH public key (as done for the Windows workload), or at minimum generating a random per-run password rather than a fixed literal.
As per coding guidelines, benchmark_runner path instructions state to "Flag any hardcoded credentials or secrets."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml`
around lines 8 - 12, The cloud-init settings in the linstress VM template use a
hardcoded `user`/`password` pair with `ssh_pwauth` enabled, which should be
removed from this benchmark fixture. Update the `linstress_vm.yaml` template to
stop using fixed credentials in this cloud-config block and instead follow the
same safer approach used by the Windows workload (prefer SSH public key
injection and disable password auth), or at minimum generate a per-run random
password rather than a literal default.
Source: Path instructions
| #cloud-config | ||
| user: fedora | ||
| password: fedora | ||
| chpasswd: { expire: False } | ||
| ssh_pwauth: true |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Hardcoded weak credentials + password SSH auth enabled.
The generated cloud-init sets user: fedora, password: fedora, and ssh_pwauth: true. This bakes a trivially guessable password into every VM instance with SSH password authentication enabled, exposing the VM to unauthorized access on the pod network. Since this is a rendered golden file, the root cause is the linstress data/cloud-init template — recommend switching to key-based auth only (disable ssh_pwauth) or generating a random per-run password.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml`
around lines 8 - 12, The cloud-init rendered by the linstress VM template is
using hardcoded weak credentials and enabling SSH password login. Update the
linstress cloud-init/template that produces linstress_vm.yaml so it no longer
sets a fixed user/password pair and does not enable ssh_pwauth; switch the
template to key-based auth only or generate a unique per-run password. Use the
cloud-init generation path associated with linstress_vm and the template/data
source that fills user, password, and ssh_pwauth to locate the fix.
Source: Path instructions
| stringData: | ||
| userdata: | | ||
| #cloud-config | ||
| user: fedora | ||
| password: fedora | ||
| chpasswd: { expire: False } | ||
| ssh_pwauth: true |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate linstress-related templates and workload code
git ls-files | rg 'linstress|Linstress|lin_stress'
echo '--- template files ---'
git ls-files | rg 'benchmark_runner/common/template_operations/templates/.+linstress|benchmark_runner/common/template_operations/templates/.+lin_stress'
echo '--- workload files ---'
git ls-files | rg 'benchmark_runner/workloads/.+linstress|benchmark_runner/workloads/.+lin_stress'
echo '--- tests for golden files ---'
git ls-files | rg 'tests/.+linstress|tests/.+golden_files/.+linstress'Repository: redhat-performance/benchmark-runner
Length of output: 3288
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- linstress_vm_template.yaml ---'
wc -l benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml
sed -n '1,220p' benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml | cat -n
echo '--- linstress_data_template.yaml ---'
wc -l benchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yaml
sed -n '1,220p' benchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yaml | cat -n
echo '--- linstress_vm.py ---'
wc -l benchmark_runner/workloads/linstress_vm.py
sed -n '1,260p' benchmark_runner/workloads/linstress_vm.py | cat -nRepository: redhat-performance/benchmark-runner
Length of output: 19449
Hardcoded VM credentials and password SSH auth
benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml:8-13 hardcodes fedora/fedora, leaves the password non-expiring, and enables ssh_pwauth. If the VM only needs the SSH key path used by linstress_vm.py, drop password login or make these values templated.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml`
around lines 6 - 12, The VM userdata in the linstress template hardcodes the
fedora account password, keeps it non-expiring, and enables password SSH auth.
Update the cloud-config in the linstress VM template so it relies on the SSH key
flow used by linstress_vm.py, and remove or template the password-related fields
and ssh_pwauth setting instead of shipping fixed credentials.
Source: Path instructions
| user: fedora | ||
| password: fedora | ||
| chpasswd: { expire: False } | ||
| ssh_pwauth: true |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Hardcoded weak default credentials for a network-exposed VM.
The cloud-init template sets user: fedora / password: fedora and enables ssh_pwauth: true, meaning every linstress_vm deployment boots with a publicly-known, static SSH password. Since this VM has SSH exposed via a masquerade interface, this is a real attack surface, especially since these test clusters can be long-lived.
Recommend generating a random password per-run (like the deadbeef suffix used for names) and injecting it via the data template, or switching to SSH-key-based auth instead of password auth.
As per path instructions, "Flag any hardcoded credentials or secrets" for benchmark_runner/**.
🔒 Suggested direction
- user: fedora
- password: fedora
- chpasswd: { expire: False }
- ssh_pwauth: true
+ user: fedora
+ ssh_pwauth: false
+ ssh_authorized_keys:
+ - "{{ linstress_ssh_public_key }}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_False/linstress_vm.yaml`
around lines 9 - 12, The cloud-init template for the linstress VM is using
hardcoded weak SSH credentials and password auth, which should be removed for
benchmark_runner artifacts. Update the template behind the linstress_vm.yaml
asset so the user/password values are not static (for example, inject a per-run
secret through the data template or switch the VM to SSH key auth), and keep the
relevant cloud-init fields such as ssh_pwauth and chpasswd aligned with that
change.
Source: Path instructions
Summary
winstress_vmworkload: Windows CPU+memory stress test with Python multiprocessing, supports scale via BootstormVM, configurablestress_cpu/stress_memory/stress_durationlinstress_vmworkload: Linux Fedora equivalent using cloud-init embedded stress script, supports scale via BootstormVM, same metrics for direct Windows/Linux comparisonWORKLOAD_CONFIGenv var to override template defaults (requests_cpu,requests_memory,sockets,cores,threads) for any workload at runtimepin_node0to pin logic for node selector supportTest plan
make golden_files— golden files regeneratedmake test_golden_files— golden file tests passPYTHONPATH=. python3 -m pytest -v tests/unittest/— all 26 unit tests passWORKLOAD=winstress_vmwith scale on Windows VMsWORKLOAD=linstress_vmwith scale on Fedora VMsWORKLOAD_CONFIGoverride works for different workloads🤖 Assisted-by: Claude Code
Summary by CodeRabbit